This data set explores Airbnb prices in Austin.
sessionInfo(package=NULL)
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyr_1.8.4 maps_3.1.1 readr_1.1.0 data.world_0.1.2
[5] dplyr_0.5.0 plotly_4.6.0 ggplot2_2.2.1 leaflet_1.1.0
[9] DT_0.2 shinydashboard_0.5.3 shiny_1.0.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 base64enc_0.1-3 tools_3.3.2 digest_0.6.11
[5] evaluate_0.10 jsonlite_1.3 tibble_1.3.0 gtable_0.2.0
[9] viridisLite_0.2.0 DBI_0.5-1 crosstalk_1.0.0 curl_2.4
[13] yaml_2.1.14 knitr_1.15.1 stringr_1.1.0 httr_1.2.1
[17] htmlwidgets_0.8 hms_0.3 rprojroot_1.2 grid_3.3.2
[21] R6_2.2.0 rmarkdown_1.3 purrr_0.2.2 tidyr_0.6.1
[25] magrittr_1.5 backports_1.0.5 scales_0.4.1 htmltools_0.3.5
[29] rsconnect_0.7 assertthat_0.1 mime_0.5 xtable_1.8-2
[33] colorspace_1.3-2 httpuv_1.3.3 labeling_0.3 stringi_1.1.2
[37] lazyeval_0.2.0 munsell_0.4.3
setwd(dir = "../00 Doc/")
source("../01 Data/ETL_listings.R")
Parsed with column specification:
cols(
id = col_integer(),
name = col_character(),
host_id = col_integer(),
host_name = col_character(),
neighbourhood_group = col_character(),
neighbourhood = col_integer(),
latitude = col_double(),
longitude = col_double(),
room_type = col_character(),
price = col_integer(),
minimum_nights = col_integer(),
number_of_reviews = col_integer(),
last_review = col_date(format = ""),
reviews_per_month = col_double(),
calculated_host_listings_count = col_integer(),
availability_365 = col_integer()
)
The following `from` values were not present in `x`: table
The following `from` values were not present in `x`: table
invalid factor level, NA generated (this warning was emitted repeatedly)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5835 obs. of 16 variables:
$ id : Factor w/ 5835 levels "1001400","1002993",..: 3982 2724 5377 5383 4967 4990 3590 4897 5154 2576 ...
$ name : Factor w/ 5780 levels "------Comfy 3/2 Home------",..: 419 2138 4123 4124 1225 4118 4521 2044 4484 4781 ...
$ host_id : Factor w/ 4633 levels "10010707","10015316",..: 2915 2126 3621 3621 2522 3474 2627 169 3538 1232 ...
$ host_name : Factor w/ 1827 levels "(email hidden)",..: 804 1268 1479 1479 224 NA 831 1427 177 NA ...
$ neighbourhood_group : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
$ neighbourhood : Factor w/ 41 levels "78701","78702",..: 25 25 25 25 25 25 25 25 25 25 ...
$ latitude : Factor w/ 5835 levels "30.1305163587544",..: 266 225 62 64 65 126 130 109 142 68 ...
$ longitude : Factor w/ 5835 levels "-5.09368239448111",..: 5739 5765 5692 5695 5746 5771 5791 5752 5699 5703 ...
$ room_type : Factor w/ 3 levels "Entire home/apt",..: 2 2 2 2 1 2 2 2 2 2 ...
$ price : Factor w/ 468 levels "0","100","1000",..: 228 464 2 2 362 2 341 283 48 302 ...
$ minimum_nights : Factor w/ 26 levels "1","10","13",..: 6 14 1 1 6 1 1 1 1 1 ...
$ number_of_reviews : Factor w/ 163 levels "0","1","10","100",..: 2 1 1 1 1 1 1 66 1 67 ...
$ last_review : Factor w/ 330 levels "2011-03-21","2012-01-26",..: 1 NA NA NA NA NA NA 324 NA 305 ...
$ reviews_per_month : Factor w/ 566 levels "0.02","0.03",..: 1 NA NA NA NA NA NA 203 NA 252 ...
$ calculated_host_listings_count: Factor w/ 14 levels "1","10","11",..: 1 1 7 7 7 1 1 1 1 1 ...
$ availability_365 : Factor w/ 362 levels "0","1","10","100",..: 155 292 293 293 285 293 223 235 282 292 ...
- attr(*, "spec")=List of 2
..$ cols :List of 16
.. ..$ id : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ name : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ host_id : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ host_name : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ neighbourhood_group : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ neighbourhood : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ latitude : list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ longitude : list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ room_type : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ price : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ minimum_nights : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ number_of_reviews : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ last_review :List of 1
.. .. ..$ format: chr ""
.. .. ..- attr(*, "class")= chr "collector_date" "collector"
.. ..$ reviews_per_month : list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ calculated_host_listings_count: list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ availability_365 : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
..$ default: list()
.. ..- attr(*, "class")= chr "collector_guess" "collector"
..- attr(*, "class")= chr "col_spec"
summary(df)
id name host_id host_name
1001400: 1 East Austin Bungalow : 6 4641823 : 127 Erica : 135
1002993: 1 East Austin Home : 4 31148752: 42 John : 56
1003316: 1 Euro Hostel/Dorm Style by Downtown: 4 8167447 : 26 Sarah : 51
1003775: 1 Charming East Austin Bungalow : 3 579290 : 18 Michael: 49
1009806: 1 East Austin Charmer : 3 1488733 : 17 Ryan : 45
1011620: 1 1 bedroom with private bath : 2 1568741 : 15 (Other):5206
(Other):5829 (Other) :5813 (Other) :5590 NA's : 293
neighbourhood_group neighbourhood latitude longitude
NA's:5835 78704 :1601 30.1305163587544: 1 -5.09368239448111: 1
78702 : 797 30.1399214304874: 1 -97.5670481812431: 1
78703 : 419 30.1406878366631: 1 -97.586245659854 : 1
78741 : 414 30.1415093891488: 1 -97.5877635840428: 1
78745 : 328 30.1423321194423: 1 -97.6136226010006: 1
78751 : 251 30.1424347881363: 1 -97.6180279594552: 1
(Other):2025 (Other) :5829 (Other) :5829
room_type price minimum_nights number_of_reviews last_review
Entire home/apt:4060 150 : 253 1 :2776 0 :2006 2015-10-26: 260
Private room :1652 200 : 227 2 :2005 1 : 698 2015-10-12: 215
Shared room : 123 250 : 203 3 : 690 2 : 452 2015-10-25: 187
300 : 181 4 : 147 3 : 302 2015-11-02: 177
100 : 175 5 : 88 4 : 275 2015-10-27: 166
125 : 156 7 : 43 5 : 186 (Other) :2824
(Other):4640 (Other): 86 (Other):1916 NA's :2006
reviews_per_month calculated_host_listings_count availability_365
1 : 211 1 :4169 365 :1236
0.13 : 194 2 : 823 364 : 268
0.25 : 79 3 : 232 363 : 203
2 : 77 7 : 169 0 : 92
0.05 : 65 4 : 152 362 : 80
(Other):3201 6 : 62 361 : 71
NA's :2008 (Other): 228 (Other):3885
To get the Airbnb data, go to data.world, search for “KurtAKranz”, and download the dataset named “S17 DV Final Project.”
We ran an ETL script from the “01 Data” folder to standardize the data, so that each column's values are formatted consistently across all data points.
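As an illustration of the kind of standardization an ETL step might perform, here is a minimal sketch in base R. It is not the project's `ETL_listings.R`; the toy data frame and the helper `factor_to_numeric` are hypothetical. It addresses the situation visible in the `str()` output, where numeric columns such as `price` are stored as factors.

```r
# Toy stand-in for the listings data; the values are made up for illustration.
listings <- data.frame(
  neighbourhood = factor(c("78704", "78702", "78704")),
  room_type     = factor(c("Entire home/apt", "Private room", "Shared room")),
  price         = factor(c("150", "100", "1000"))
)

# Hypothetical helper: convert a factor whose labels are numbers into numbers.
# Calling as.numeric() directly on a factor would return the internal level
# codes rather than the labels, so we go through as.character() first.
factor_to_numeric <- function(x) as.numeric(as.character(x))

clean <- listings
clean$price <- factor_to_numeric(clean$price)
# Zip codes are identifiers, so keep them as text rather than numbers.
clean$neighbourhood <- as.character(clean$neighbourhood)

str(clean)
```

Converting through `as.character()` matters because a factor's integer codes follow the sorted level order, not the numeric value of the labels.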
We then used Tableau’s data connector to connect to our data set on data.world.
We created six visualizations using boxplots, scatter plots, histograms, crosstabs, and bar charts.
In our first visualization, we showed a crosstab of zip code by room type, with average price as the key performance indicator. We also created parameters that let the viewer select a price range and see which locations fall within it.
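The crosstab idea can be sketched in base R as follows. This is only an approximation of the Tableau view: the toy data and the `min_price` and `max_price` values (standing in for the Tableau parameters) are hypothetical.

```r
# Toy listings; prices are made up for illustration.
listings <- data.frame(
  zip       = c("78701", "78701", "78702", "78702", "78704", "78704"),
  room_type = c("Entire home/apt", "Private room",
                "Entire home/apt", "Private room",
                "Entire home/apt", "Entire home/apt"),
  price     = c(400, 150, 200, 80, 180, 220)
)

# Average price for every zip code / room type combination.
# Combinations with no listings come out as NA.
crosstab <- tapply(listings$price,
                   list(listings$zip, listings$room_type),
                   mean)

# Hypothetical stand-ins for the price-range parameters.
min_price <- 100
max_price <- 250

# TRUE where the average price falls inside the chosen range.
in_range <- crosstab >= min_price & crosstab <= max_price

print(crosstab)
print(in_range)
```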
In our second visualization, we showed each zip code’s deviation from the average price across all zip codes. From this, we created a set of the five most expensive zip codes.
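A rough base R equivalent of this calculation is sketched below. The data is invented, and the sketch assumes that "average price across all zip codes" means the mean of the per-zip averages; the Tableau workbook may define it differently.

```r
# Toy listings; prices are made up for illustration.
listings <- data.frame(
  zip   = c("78701", "78701", "78702", "78702", "78703",
            "78704", "78704", "78741", "78745", "78751"),
  price = c(300, 500, 100, 140, 250, 150, 170, 90, 110, 130)
)

# Average price within each zip code.
avg_by_zip <- tapply(listings$price, listings$zip, mean)

# Assumption: the benchmark is the mean of the per-zip averages.
overall_avg <- mean(avg_by_zip)

# Positive values are zip codes priced above the benchmark.
deviation <- avg_by_zip - overall_avg

# The five zip codes with the highest average price.
top5 <- names(avg_by_zip)[order(avg_by_zip, decreasing = TRUE)][1:5]

print(deviation)
print(top5)
```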
In our third visualization, we showed the average price of each room type, grouped by zip code. Each room type has a reference line showing its average price.
In our fourth visualization, we created a histogram of the average price to show that the data is skewed right. Because a few high prices pull the mean upward, the median represents the typical price better than the mean does.
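A small worked example shows why the median holds up better under right skew. The prices below are invented for illustration and are not taken from the data set.

```r
# Five made-up nightly prices; one very expensive listing skews them right.
prices <- c(80, 100, 120, 150, 1000)

mean_price   <- mean(prices)    # pulled upward by the 1000 listing
median_price <- median(prices)  # the middle value, unaffected by its size

cat("mean:", mean_price, " median:", median_price, "\n")
```

Here the mean is 290 while the median is 120, even though four of the five listings cost 150 or less.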
Our fifth visualization was Population vs. Average Price. We joined our data with the census data and created a scatter plot with neighbourhood as the columns and AVG(Population) and AGG(Average Price) as the rows. We then added a trend line to show the data’s negative trend.
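The join and trend line can be approximated in base R as sketched below. Both data frames hold invented numbers, the column names are assumptions, and a simple linear fit with `lm()` stands in for Tableau's trend line.

```r
# Toy listings and toy census table; all numbers are made up.
listings <- data.frame(
  zip   = c("78701", "78701", "78702", "78702", "78704", "78741"),
  price = c(380, 420, 300, 320, 190, 100)
)
census <- data.frame(
  zip        = c("78701", "78702", "78704", "78741"),
  population = c(10000, 20000, 30000, 40000)
)

# Average price per zip code, then join to population on the zip column.
avg_price <- aggregate(price ~ zip, data = listings, FUN = mean)
joined    <- merge(avg_price, census, by = "zip")

# Linear trend of average price against population.
fit   <- lm(price ~ population, data = joined)
slope <- coef(fit)[["population"]]

# Scatter plot with the fitted trend line.
plot(joined$population, joined$price,
     xlab = "Population", ylab = "Average price")
abline(fit)
```

A negative `slope` corresponds to the downward trend line described above.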
In our sixth and final visualization, we created a boxplot of the prices in the five most expensive zip codes. It shows the outliers in the data, which again indicates that the median is the better measure of the typical price.
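To make the notion of an outlier concrete, the sketch below uses base R's `boxplot.stats()`, which flags points lying more than 1.5 times the hinge spread beyond the hinges. The prices are invented, and Tableau's boxplot settings may differ from this default.

```r
# Six made-up prices for a single zip code.
prices <- c(100, 120, 130, 150, 160, 900)

stats <- boxplot.stats(prices)

# Five-number summary used to draw the box and whiskers.
print(stats$stats)

# Points beyond the whiskers, which a boxplot draws individually.
outliers <- stats$out
print(outliers)
```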
For our Shiny visualizations, we created a data frame for each graph. We wrote specific queries for the data each one needed. We then used ggplot to build a ggplot object from each data frame, and used plotly to render that object as a more interactive display.
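The data frame to ggplot to plotly pipeline might look roughly like this. It is a sketch rather than the app's actual code: the data frame `avg_price_df` and its values are hypothetical, and in a Shiny app the final object would typically be returned from `renderPlotly()`.

```r
library(ggplot2)
library(plotly)

# Hypothetical query result: one row per zip code.
avg_price_df <- data.frame(
  zip       = c("78701", "78702", "78704"),
  avg_price = c(400, 120, 160)
)

# Build a static ggplot object from the data frame.
p <- ggplot(avg_price_df, aes(x = zip, y = avg_price)) +
  geom_col() +
  labs(x = "Zip code", y = "Average price")

# Convert the ggplot object into an interactive plotly widget.
interactive_plot <- ggplotly(p)
```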
This shows the relative pricing of each zip code compared with the overall average. It lets you compare prices and see whether you are paying above the market average.
Histogram of the average price, showing that the data is skewed right. For data like this, the median describes the typical price better than the mean.
Scatter plot created from a join between our data and the census data. The trend line’s negative slope indicates that the average price of a zip code decreases as its population increases.
Boxplot of the prices in the five most expensive zip codes. It shows the outliers in the data, again indicating that the median is the better measure of the typical price.